Tangent-3 at the NTCIR-12 MathIR Task

Authors

  • Kenny Davila
  • Richard Zanibbi
  • Andrew Kane
  • Frank Wm. Tompa

Abstract

We present the math-aware search engine Tangent-3 and report its results for the NTCIR-12 MathIR task. Tangent-3 uses a federated search over two indices: 1) a TF-IDF textual search engine (Solr), and 2) a query-by-expression engine. Math expressions are stored in an inverted index as pairs of symbols extracted from a Symbol Layout Tree representation built from Presentation MathML. Retrieval uses a cascade model with two stages. In the first stage, relevant expressions are retrieved quickly using iterator trees over posting lists to find matches, and the matched expressions are ranked by the Dice coefficient of matched symbol pairs. In the second stage, the top-k candidates are reranked with a stricter similarity metric that supports unification and wildcard matching. Our system achieves Precision@5 of 21% for relevant hits (50% when partially relevant hits are included) on the main arXiv task, 25% (49%) on the main Wikipedia subtask, and 45% (84%) on the Wikipedia Formula Browsing subtask.
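The first retrieval stage can be illustrated with a minimal Python sketch. It assumes a toy Symbol Layout Tree in which each node holds one symbol and maps single-character relation codes (here `n` for next and `a` for above) to its children, and it treats a symbol pair as an (ancestor, descendant, relation path) tuple within a fixed window. `Node`, `symbol_pairs`, `build_index`, `dice_search`, and the `window` value are hypothetical names and settings, not Tangent-3's code. The sketch also replaces the iterator trees over posting lists with a simple per-pair accumulator, so it demonstrates the Dice ranking rather than Tangent-3's matching machinery.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol of a simplified (hypothetical) Symbol Layout Tree."""
    label: str
    # Single-character relation code -> child, e.g. "n" (next), "a" (above).
    children: dict = field(default_factory=dict)


def symbol_pairs(root, window=2):
    """Collect (ancestor, descendant, relation path) tuples for every
    descendant at most `window` edges below each node (assumed pair form)."""
    pairs = set()

    def descend(anchor, node, path):
        for rel, child in node.children.items():
            new_path = path + rel  # one character per edge
            pairs.add((anchor.label, child.label, new_path))
            if len(new_path) < window:
                descend(anchor, child, new_path)

    def visit(node):
        descend(node, node, "")
        for child in node.children.values():
            visit(child)

    visit(root)
    return pairs


def build_index(expr_pairs):
    """Inverted index: symbol pair -> set of ids of expressions containing it."""
    index = defaultdict(set)
    for expr_id, pairs in expr_pairs.items():
        for pair in pairs:
            index[pair].add(expr_id)
    return index


def dice_search(index, expr_pairs, query_pairs, k=10):
    """First stage: rank candidates by Dice = 2|Q & D| / (|Q| + |D|), keep top k.
    A plain accumulator stands in for iterator trees over posting lists."""
    matched = defaultdict(int)
    for pair in query_pairs:
        for expr_id in index.get(pair, ()):
            matched[expr_id] += 1
    scored = [(2 * m / (len(query_pairs) + len(expr_pairs[expr_id])), expr_id)
              for expr_id, m in matched.items()]
    scored.sort(reverse=True)
    return scored[:k]


def x_squared_plus(last):
    """Toy SLT for x^2 + <last>: 2 is above x, + is next to x."""
    return Node("x", {"a": Node("2"), "n": Node("+", {"n": Node(last)})})


exprs = {"e1": symbol_pairs(x_squared_plus("1")),
         "e2": symbol_pairs(x_squared_plus("y"))}
index = build_index(exprs)
query = symbol_pairs(x_squared_plus("1"))
print(dice_search(index, exprs, query))  # [(1.0, 'e1'), (0.5, 'e2')]
```

Here `x^2 + y` shares two of the query's four pairs, so its Dice score is 2·2/(4+4) = 0.5.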
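The second stage can be sketched in the same toy setting. As a simplified stand-in for Tangent-3's stricter metric, the hypothetical `strict_score` below greedily aligns the query tree onto each node of a candidate tree and keeps the best fraction of query nodes matched. Query symbols beginning with `?` are treated as wildcards, and a repeated wildcard must bind to the same candidate symbol, which is one simple form of unification. Only the ids returned by the first stage are rescored, which is what keeps a cascade cheap. None of these names or scoring details are taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol of a simplified (hypothetical) Symbol Layout Tree."""
    label: str
    children: dict = field(default_factory=dict)  # relation code -> child


def nodes(node):
    """Yield every node of the subtree rooted at `node`."""
    yield node
    for child in node.children.values():
        yield from nodes(child)


def aligned_matches(q, c, bindings):
    """Greedily align query q onto candidate c from their roots and count
    matched query nodes (no backtracking)."""
    if q.label.startswith("?"):
        # Wildcard binds one candidate symbol; repeats must agree (unification).
        if bindings.setdefault(q.label, c.label) != c.label:
            return 0
    elif q.label != c.label:
        return 0
    total = 1
    for rel, q_child in q.children.items():
        if rel in c.children:
            total += aligned_matches(q_child, c.children[rel], bindings)
    return total


def strict_score(query, candidate):
    """Best fraction of query nodes matched, trying every candidate node as root."""
    query_size = sum(1 for _ in nodes(query))
    best = max(aligned_matches(query, start, {}) for start in nodes(candidate))
    return best / query_size


def rerank(query, candidates, first_stage):
    """Second stage: rescore only the first-stage (score, id) list."""
    rescored = [(strict_score(query, candidates[i]), i) for _, i in first_stage]
    rescored.sort(reverse=True)
    return rescored


def sum_tree(left, right):
    """Toy SLT for left + right on one baseline."""
    return Node(left, {"n": Node("+", {"n": Node(right)})})


query = sum_tree("?a", "?a")
candidates = {"e1": sum_tree("x", "x"), "e2": sum_tree("x", "y")}
first_stage = [(0.9, "e2"), (0.8, "e1")]  # toy first-stage output
print(rerank(query, candidates, first_stage))
# [(1.0, 'e1'), (0.6666666666666666, 'e2')]
```

Under this toy metric, `x + x` fully matches the query `?a + ?a`, while `x + y` matches only two of the three query nodes because `?a` cannot bind to both `x` and `y`; the stricter second stage therefore reverses the toy first-stage order.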
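Each pair of Precision@5 figures scores the same top-5 list under two relevance cut-offs. The following minimal sketch assumes, for illustration only, that each hit has a combined relevance score from 0 to 4, that "relevant" means a score of at least 3, and that "partially relevant" means a score of at least 1 (so relevant hits also count as partially relevant). The function name, the cut-offs, and the toy judgments are assumptions, not the official evaluation script.

```python
def precision_at_k(ranking, judgments, threshold, k=5):
    """Fraction of the top-k results whose relevance score meets `threshold`.
    Unjudged documents count as non-relevant; the divisor is always k."""
    hits = sum(1 for doc in ranking[:k] if judgments.get(doc, 0) >= threshold)
    return hits / k


RELEVANT, PARTIALLY_RELEVANT = 3, 1  # assumed cut-offs on a 0-4 scale

ranking = ["a", "b", "c", "d", "e", "f"]
judgments = {"a": 4, "b": 1, "c": 0, "d": 3, "f": 4}  # toy judgments
print(precision_at_k(ranking, judgments, RELEVANT))            # 0.4
print(precision_at_k(ranking, judgments, PARTIALLY_RELEVANT))  # 0.6
```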

Similar resources

NTCIR-12 MathIR Task Overview

We present an overview of the NTCIR-12 MathIR Task, dedicated to information access for mathematical content. The MathIR task makes use of two corpora. The first corpus contains excerpts from technical articles in the arXiv, while the second corpus contains English Wikipedia articles. For each corpus, there were two subtasks. Three subtasks contain queries with keywords and formulae (arXiv-main...

Exploring the One-brain Barrier: A Manual Contribution to the NTCIR-12 MathIR Task

This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...

MCAT Math Retrieval System for NTCIR-12 MathIR Task

This paper describes the participation of our MCAT search system in the NTCIR-12 MathIR Task. We introduce three granularity levels of textual information, a new approach for generating dependency graphs of math expressions, score normalization, cold-start weights, and unification. We find that these modules, except the cold-start weights, have a very good impact on the search performance of our s...

Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies

This paper summarizes the experience of the Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Based on NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform. Using this platform we rigorously evaluated combinations of new features and picked the most promising on...

Exploring the One-brain Barrier: a Manual Contribution to the NTCIR-12 Math Task

This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...

Publication date: 2016